Evaluating Concrete Strength Model Performance

Using Cross-validation Methods

Sai Devarashetty, Mattick, Musson, Perez

2024-07-29

Introduction to Cross-Validation

  • Measure performance and generalizability of machine learning and predictive models.
  • Compare different models constructed from the same data set.

CV is widely used in fields including:

  • Machine Learning
  • Data Mining
  • Bioinformatics

CV is used to:

  • Minimize overfitting
  • Ensure a model generalizes to unseen data
  • Tune hyperparameters

Definitions

Generalizability:
How well predictive models created from a sample fit other samples from the same population.

Overfitting:
When a model fits the training data too closely, learning its idiosyncrasies rather than the underlying patterns.

Model fits characteristics specific to the training set:

  • Noise
  • Random fluctuations
  • Outliers

Hyperparameters:
Model configuration variables that are set before training, for example:

  • Nodes and layers in a neural network
  • Branches in a decision tree
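For illustration (assuming scikit-learn is available; the specific settings below are arbitrary examples), hyperparameters are supplied as configuration arguments before fitting:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Hyperparameters are fixed before training; they are not learned from the data.
tree = DecisionTreeRegressor(max_depth=4)         # limits branching of the tree
net = MLPRegressor(hidden_layer_sizes=(32, 16))   # nodes and layers of the network
print(tree.get_params()["max_depth"], net.get_params()["hidden_layer_sizes"])
```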

Process

Subset the data into K approximately equally sized folds:

  • Randomly
  • Without replacement

(Song, Tang, and Wee 2021)

Split the subsets into training and test sets:

  • 1 test set
  • K-1 training sets

  • Fit the model to the training data
  • Apply the fitted model to the test set
  • Measure the prediction error

Repeat K Times

  • Fit the model to each of the K combinations of K-1 training folds
  • Use each fold as the test set exactly once

Calculate the mean error
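The steps above can be sketched in Python. This is a minimal sketch using NumPy on synthetic stand-in data (the concrete data set is not loaded here), with ordinary least squares as a placeholder model:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data (hypothetical, not the concrete data set).
X = rng.uniform(0, 1, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=100)

def k_fold_mse(X, y, k=5, seed=0):
    """Average prediction error over k folds, following the steps above."""
    n = len(y)
    # Subset the data into k approximately equal folds: randomly, without replacement.
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]                                    # 1 test set
        train = np.concatenate(folds[:i] + folds[i + 1:])  # k-1 training sets
        # Fit an ordinary-least-squares model to the training data.
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        # Apply the fitted model to the test set and measure the prediction error.
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef
        errors.append(np.mean((y[test] - pred) ** 2))
    # Repeat k times, then calculate the mean error.
    return float(np.mean(errors))

print(k_fold_mse(X, y, k=5))
```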

Bias-Variance Trade-Off

K-Fold vs. LOOCV
Method   Computation   Bias           Variance
K-Fold   Lower         Intermediate   Lower
LOOCV    Highest       Unbiased       High

K-fold where K = 5 or K = 10 is recommended:

  • Lower computational cost
  • Does not show excessive bias
  • Does not show excessive variance

(James et al. 2013), (Gorriz et al. 2024)

Model Measures of Error (MOE)

  • Measure the quality of fit of a model
  • Measuring error is a critical data modeling step
  • Different MOE for different data types

By measuring the quality of fit, we can select the model that generalizes best.

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{f}(x_i)| \tag{1} \]

  • A measure of error magnitude
  • The sign does not matter - absolute value
  • Lower magnitude indicates better fit
  • Take the mean absolute difference between:
    • observed \((y_i)\) and the predicted \(\hat{f}(x_i)\) values
  • \(n\) is the number of observations,
  • \(\hat{f}(x_i)\) is the model prediction \(\hat{f}\) for the ith observation
  • \(y_i\) is the observed value

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{f}(x_i))^2} \tag{2} \]

  • A measure of error magnitude
  • Lower magnitude indicates better fit
  • Error is weighted
    • Squaring the errors gives more weight to the larger ones
    • Taking the square root returns the error to the same units as the response variable

\[ \text{R}^2 = \frac{SS_{tot}-SS_{res}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{f}(x_i))^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{3} \]

  • Proportion of the variance explained by the predictor(s)
  • A higher value indicates a better fit
    • An \(R^2\) value of 0.75 indicates 75% of the variance in the response variable is explained by the predictor(s)

(James et al. 2013), (Hawkins, Basak, and Mills 2003), (Helsel and Hirsch 1993)
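The three measures can be written directly from Equations (1)-(3). A minimal NumPy sketch; the observed and predicted values below are made up for illustration:

```python
import numpy as np

def mae(y, f_hat):
    # Equation (1): mean absolute difference between observed and predicted.
    return float(np.mean(np.abs(y - f_hat)))

def rmse(y, f_hat):
    # Equation (2): squared errors weight large misses more; the square
    # root returns the result to the units of the response variable.
    return float(np.sqrt(np.mean((y - f_hat) ** 2)))

def r_squared(y, f_hat):
    # Equation (3): proportion of variance explained by the predictor(s).
    ss_res = np.sum((y - f_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

# Made-up observed and predicted values, for illustration only.
y = np.array([1.0, 2.0, 3.0, 4.0])
f_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(mae(y, f_hat), rmse(y, f_hat), r_squared(y, f_hat))
```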

k-Fold Cross-Validation

\[ CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \text{Measure of Error}_i \tag{4} \]

(James et al. 2013), (Browne 2000)
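Equation (4) is what scikit-learn computes when per-fold scores are averaged. A sketch on synthetic stand-in data (scikit-learn is assumed available; note that its scoring convention negates errors so that larger is better):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))        # synthetic stand-in data
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(0, 0.1, size=100)

# Equation (4): the CV estimate is the average of the per-fold errors.
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_error",  # sklearn negates errors: higher = better
)
cv_k = -scores.mean()
print(cv_k)
```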

Leave-One-Out Cross-Validation (LOOCV)

\[ CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \text{Measure of Error}_i \tag{5} \]

(James et al. 2013), (Browne 2000)
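LOOCV is the k = n special case: each observation is held out exactly once. A pure-NumPy sketch on synthetic stand-in data, computing Equation (5) with MAE as the measure of error:

```python
import numpy as np

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=30)

abs_errors = []
for i in range(len(y)):
    train = np.delete(np.arange(len(y)), i)   # leave observation i out
    slope, intercept = np.polyfit(x[train], y[train], 1)
    pred = slope * x[i] + intercept           # predict the held-out point
    abs_errors.append(abs(y[i] - pred))

cv_n = float(np.mean(abs_errors))  # Equation (5), with MAE as the measure of error
print(cv_n)
```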

Nested Cross-Validation

  • Two rounds of cross-validation, one nested inside the other:
    • The inner loop tunes hyperparameters
    • The outer loop estimates generalization error
  • Prevents hyperparameter tuning from biasing the performance estimate

(Berrar et al. 2019)
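A minimal nested-CV sketch, assuming scikit-learn; Ridge regression and its alpha grid are illustrative stand-ins, not the models used in this study:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))                 # synthetic stand-in data
y = X @ rng.normal(size=5) + rng.normal(0, 0.2, size=120)

# Inner loop: tune the hyperparameter (here Ridge's alpha) by 3-fold CV.
inner = GridSearchCV(
    Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=KFold(n_splits=3, shuffle=True, random_state=1),
)
# Outer loop: estimate the generalization error of the whole tuning procedure,
# so the tuning step never sees the outer test folds.
outer_scores = cross_val_score(
    inner, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=2),
    scoring="neg_root_mean_squared_error",
)
nested_rmse = -outer_scores.mean()
print(nested_rmse)
```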

Study Data

(I-C Yeh 1998) modeled the compressive strength of high-performance concrete (HPC) at various ages and made with different ratios of components. The data used in the study were made publicly available and can be downloaded from the UCI Machine Learning Repository (I-Cheng Yeh 2007).

Data Exploration and Visualization

  • Target variable:
    • Strength (MPa)
  • Predictor variables:
    • Cement (kg/m³)
    • Superplasticizer (kg/m³)
    • Age (days)
    • Water (kg/m³)

All variables are quantitative

Linear Regression Model

                  Estimate     Std. Error   t value     Pr(>|t|)
(Intercept)       28.2578655   5.1878634      5.446918   1.0e-07
Cement             0.0668433   0.0039668     16.850539   0.0e+00
Superplasticizer   0.8716897   0.0903825      9.644449   0.0e+00
Age                0.1110466   0.0069538     15.969235   0.0e+00
Water             -0.1195600   0.0257210     -4.648334   3.9e-06

\[ \hat{\text{Strength}} = 28.258 + 0.067\,\text{Cement} + 0.872\,\text{Superplasticizer} + 0.111\,\text{Age} - 0.120\,\text{Water} \]
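Plugging a hypothetical mix into the fitted coefficients from the regression table above (the mix values below are illustrative only, not taken from the study data):

```python
# Coefficients from the fitted linear regression table above (rounded).
intercept = 28.258
b_cement, b_super, b_age, b_water = 0.067, 0.872, 0.111, -0.120

# Hypothetical mix: these input values are illustrative, not from the study data.
cement, superplasticizer, age, water = 300.0, 8.0, 28.0, 180.0

strength = (intercept
            + b_cement * cement
            + b_super * superplasticizer
            + b_age * age
            + b_water * water)
print(round(strength, 2))  # predicted compressive strength in MPa
```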

Linear Regression CV Results

  • k-Fold Results:

    Measure of Error   Result
    RMSE                12.13
    MAE                  9.23
    R2                   0.46

  • LOOCV Results:

    Measure of Error   Result
    RMSE                12.13
    MAE                  9.23
    R2                   0.46

  • Nested CV Results:

    Measure of Error   Result
    RMSE                11.87
    MAE                  9.43
    R2                   0.49

LightGBM Model

Measure of Error   Result
RMSE                 8.73
MAE                  6.82
R2                   0.73

  • Ensemble of decision trees
  • Uses gradient boosting
  • Final prediction is the sum of predictions from all individual trees
  • Feature importance
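The sum-of-trees structure can be checked directly. LightGBM itself is not required for the illustration; scikit-learn's GradientBoostingRegressor (assumed here as a stand-in with the same gradient-boosting structure) exposes its individual trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                 # synthetic stand-in data
y = 5.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + rng.normal(0, 0.1, size=200)

gbm = GradientBoostingRegressor(
    n_estimators=50, learning_rate=0.1, max_depth=3, random_state=0
).fit(X, y)

# The ensemble prediction is the initial estimate plus the learning-rate-scaled
# sum of every individual tree's prediction.
x0 = X[:1]
manual = gbm.init_.predict(x0) + gbm.learning_rate * sum(
    stage[0].predict(x0) for stage in gbm.estimators_
)
print(np.allclose(manual, gbm.predict(x0)))
```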

LightGBM CV Results

  • k-Fold Results:

    Measure of Error   Result
    RMSE                 8.73
    MAE                  6.82
    R2                   0.73

  • LOOCV Results:

    Measure of Error   Result
    RMSE                 5.93
    MAE                  4.32
    R2                   0.87

  • Nested CV Results:

    Measure of Error   Result
    RMSE                 8.27
    MAE                  6.39
    R2                   0.75

Comparison of Models

  • Performance comparison: Linear Regression vs. LightGBM
  • Advantages and disadvantages of each model

Method   Measure of Error   Linear Regression   LightGBM
5-Fold   RMSE               12.13                8.73
5-Fold   MAE                 9.23                6.82
5-Fold   R2                  0.46                0.73
LOOCV    RMSE               12.13                5.93
LOOCV    MAE                 9.23                4.32
LOOCV    R2                  0.46                0.87
NCV      RMSE               11.87                8.27
NCV      MAE                 9.43                6.39
NCV      R2                  0.49                0.75

Model Comparison k-Fold Plot

Model Comparison LOOCV Plot

Model Comparison Nested CV Plot

Conclusion: Overview

  • Evaluation of Two Models:
    • Linear Regression Model
    • LightGBM Model

  • Cross-Validation Methods Used:
    • k-fold Cross-Validation
    • Leave-One-Out Cross-Validation (LOOCV)
    • Nested Cross-Validation

Conclusion: Key Findings

  • Model Performance:
    • LightGBM consistently outperformed Linear Regression
    • Linear Regression provided baseline insights into linear relationships

  • Cross-Validation Insights:
    • k-fold CV showed LightGBM’s superior generalization
    • LOOCV confirmed robustness across individual data points
    • Nested CV mitigated overfitting, ensuring genuine predictive power

Conclusion: Implications and Future Directions

  • Implications for Future Research:
    • Importance of advanced cross-validation techniques
    • Enhancing model validation processes
    • Ensuring model generalizability and reliability across various applications

  • Future Directions:
    • Continuous refinement of cross-validation methods
    • Exploration of implications in different predictive modeling scenarios
    • Development of robust predictive models through improved validation processes

References

All figures were created by the authors.

Berrar, Daniel et al. 2019. “Cross-Validation.”
Browne, Michael W. 2000. “Cross-Validation Methods.” Journal of Mathematical Psychology 44 (1): 108–32.
Gorriz, Juan M, Fermín Segovia, Javier Ramirez, Andrés Ortiz, and John Suckling. 2024. “Is k-Fold Cross Validation the Best Model Selection Method for Machine Learning?” arXiv Preprint arXiv:2401.16407.
Hawkins, Douglas M, Subhash C Basak, and Denise Mills. 2003. “Assessing Model Fit by Cross-Validation.” Journal of Chemical Information and Computer Sciences 43 (2): 579–86.
Helsel, Dennis R, and Robert M Hirsch. 1993. Statistical Methods in Water Resources. Elsevier.
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, et al. 2013. An Introduction to Statistical Learning. Vol. 112. Springer.
Song, Q Chelsea, Chen Tang, and Serena Wee. 2021. “Making Sense of Model Generalizability: A Tutorial on Cross-Validation in R and Shiny.” Advances in Methods and Practices in Psychological Science 4 (1): 2515245920947067.
Yeh, I-C. 1998. “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete Research 28 (12): 1797–1808.
Yeh, I-Cheng. 2007. “Concrete Compressive Strength.” UCI Machine Learning Repository.